Improving precision in concept normalization.

Authors

  • Mayla Boguslav
  • K Bretonnel Cohen
  • William A Baumgartner
  • Lawrence E Hunter
Abstract

Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases for natural language processing, there are reasons to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision, using a variety of both pre-processing and post-processing methods. These methods draw on both knowledge-based and frequentist approaches to modeling language. Based on an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely between the ontologies.
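
The idea of exploiting a Zipfian distribution of false positives can be illustrated with a small sketch. This is not the authors' pipeline: the data, the concept identifiers, and the function names below are all hypothetical, and the only assumption taken from the abstract is that a few frequent error types account for a large share of the false positives, so blocking that "head" in a post-processing step may raise precision.

```python
from collections import Counter

# Hypothetical toy data: predictions are (doc_id, offset, text, concept_id);
# gold annotations are (doc_id, offset, concept_id). All IDs are made up.
predictions = [
    ("d1", 0, "lead", "CONCEPT:1"),
    ("d1", 10, "lead", "CONCEPT:1"),
    ("d2", 5, "lead", "CONCEPT:1"),
    ("d2", 20, "cell", "CONCEPT:2"),
    ("d3", 3, "nucleus", "CONCEPT:3"),
    ("d3", 9, "bank", "CONCEPT:4"),
]
gold = {("d2", 20, "CONCEPT:2"), ("d3", 3, "CONCEPT:3")}


def is_correct(pred, gold):
    """A prediction is correct if its document, offset, and concept are in gold."""
    doc_id, offset, _text, concept_id = pred
    return (doc_id, offset, concept_id) in gold


def precision(preds, gold):
    """True positives divided by all predictions (0.0 if there are none)."""
    if not preds:
        return 0.0
    return sum(is_correct(p, gold) for p in preds) / len(preds)


def top_false_positives(preds, gold, k):
    """Return the k most frequent (text, concept_id) pairs among false positives."""
    counts = Counter((p[2], p[3]) for p in preds if not is_correct(p, gold))
    return {pair for pair, _count in counts.most_common(k)}


def filter_predictions(preds, blocked):
    """Post-processing step: drop predictions whose (text, concept_id) is blocked."""
    return [p for p in preds if (p[2], p[3]) not in blocked]


before = precision(predictions, gold)
blocked = top_false_positives(predictions, gold, k=1)
after = precision(filter_predictions(predictions, blocked), gold)
print(f"precision before: {before:.2f}, after: {after:.2f}")
```

In practice the blocked pairs would be derived from held-out annotated data rather than from the data being evaluated, and blocking a pair can also remove true positives, which is the recall cost the abstract refers to.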


Similar resources

Improving Search and Retrieval Performance through Shortening Documents, Detecting Garbage, and Throwing Out Jargon

This thesis describes the development of a new search and retrieval system used to index and process queries for several different data sets of documents. This thesis also describes my work with the TREC Legal data set, in particular, the new algorithms I designed to improve recall and precision rates in the legal domain. I have applied novel normalization techniques that are designed to slight...


Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion

The rapidly growing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among all the proposed methods, conditio...


Improving Term Frequency Normalization for Multi-topical Documents and Application to Language Modeling Approaches

Term frequency normalization is a serious issue because document lengths vary. Generally, documents become long for two different reasons: verbosity and multi-topicality. First, verbosity means that the same topic is repeatedly mentioned by terms related to the topic, so that term frequency is higher than in a well-summarized document. Second, multi-topicality indicates that a docu...


Integrated cTAKES for Concept Mention Detection and Normalization

We participated in Task 1 using an existing system, MedTagger, implemented in integrated cTAKES (icTAKES). The concept mention detection is based on Conditional Random Fields (CRF) and the concept mention normalization is based on a greedy dictionary lookup algorithm. A distinctive feature in MedTagger compared to other concept mention detection systems is the incorporation of dictionary lookup resu...


A survey on the comparison between precision and traditional agriculture by budgeting method

The present study was conducted to compare precision and traditional agriculture by the budgeting technique. Its statistical population consists of 210 experts in the agricultural jihad organization of Qom province. The validity of the questionnaire as a research tool was confirmed by professors, while its reliability was corroborated by Cronbach's alpha values in the 0.78-0.94 range. According to the findings, ther...



Journal title:
  • Pacific Symposium on Biocomputing

Volume 23

Publication date 2018